Prediction and functional analysis of prokaryote lysine acetylation site by incorporating six types of features into Chou's general PseAAC

J Theor Biol. 2019 Jan 14:461:92-101. doi: 10.1016/j.jtbi.2018.10.047. Epub 2018 Oct 23.

Abstract

Lysine acetylation is one of the most important types of protein post-translational modification (PTM) and is widely involved in cellular regulatory processes. Identifying acetylation sites is the first and most important step toward fully understanding the regulatory mechanism of acetylation. However, experimental identification of protein acetylation sites is often time-consuming and expensive, so computational prediction of PTM sites has become popular in recent years. Here, we developed a novel method, ProAcePred 2.0, to predict species-specific prokaryote lysine acetylation sites. We used information gain, an efficient position-specific analysis strategy, to construct the position-specific window of acetylation peptides, then incorporated different types of features and applied the elastic net algorithm to optimize the feature vectors for model learning. On the training datasets, the prediction models achieved area under the receiver operating characteristic curve (AUC) values of 0.78, 0.752, 0.783, 0.718, 0.839 and 0.826 for Escherichia coli, Corynebacterium glutamicum, Mycobacterium tuberculosis, Bacillus subtilis, S. typhimurium and Geobacillus kaustophilus, respectively. On independent test datasets, our method was highly competitive with other methods for the majority of species. In addition, functional analyses demonstrated that acetylated proteins in different organisms were preferentially involved in different biological processes and pathways. The detailed analyses in this paper could help to clarify the acetylation mechanism and provide guidance for related experimental validation. A user-friendly online web service for ProAcePred 2.0 is freely available at http://computbiol.ncu.edu.cn/PAPred.
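The position-specific information gain step mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formula, window size, or selection threshold, so everything here is an assumption: the function names (`entropy`, `position_information_gain`, `top_positions`), the toy peptides, and the choice of the standard entropy-based definition of information gain are illustrative only, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def position_information_gain(peptides, labels):
    """Information gain of each window position with respect to the class label.

    peptides: equal-length strings (hypothetical windows around a lysine).
    labels:   1 for acetylated, 0 for non-acetylated (assumed encoding).
    """
    base = entropy(labels)
    n = len(labels)
    gains = []
    for pos in range(len(peptides[0])):
        # Group class labels by the residue observed at this position.
        groups = {}
        for pep, lab in zip(peptides, labels):
            groups.setdefault(pep[pos], []).append(lab)
        # Conditional entropy H(class | residue at pos).
        conditional = sum(len(g) / n * entropy(g) for g in groups.values())
        gains.append(base - conditional)
    return gains


def top_positions(gains, k):
    """Indices of the k positions with the highest information gain."""
    ranked = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    return sorted(ranked[:k])


# Toy example (made-up data): only position 0 separates the two classes.
peptides = ["AKG", "AKC", "DKG", "DKC"]
labels = [1, 1, 0, 0]
gains = position_information_gain(peptides, labels)
selected = top_positions(gains, 1)
```

In this toy case the central lysine (position 1) is identical in every peptide and therefore carries no information gain, while position 0 perfectly separates the classes. A real window would be longer and the number of retained positions would need to be tuned per species.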

Keywords: Elastic net; Information gain; Post-translational modifications; Predictor.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acetylation
  • Bacteria / genetics
  • Bacteria / metabolism
  • Bacterial Proteins / metabolism*
  • Binding Sites
  • Computational Biology / methods
  • Datasets as Topic
  • Lysine / metabolism*
  • Prokaryotic Cells / metabolism*
  • Protein Processing, Post-Translational*
  • Species Specificity
  • Support Vector Machine*

Substances

  • Bacterial Proteins
  • Lysine